(54) Abstract Title

Converting a standard logic execution unit to perform SIMD operations 



(57) Microprocessor circuit 10 includes a standard 
execution unit which executes instruction 14 by 
performing a standard operation (e.g. an ALU operation or 
a Shift operation) on operands in registers 12. A correction 
circuit 20 modifies the results from the standard operation 
when instruction 14 is a SIMD (Single Instruction Multiple
Data) instruction, and is bypassed for non-SIMD 
instructions. Arithmetic operations are corrected by 
operating on the results based on the significant bits and 
carry bits at the boundaries between the operand subsets 
(e.g. of Word or Double word size) on which the SIMD 
instruction operates (fig. 4). With Shift operations, a mask 
generator (76, fig. 5) is used in parallel with the Shifter (68, 
fig. 5) and modification of the standard shift results is 
performed by means of an address overlay mask. Use of 
the correction circuit 20 thus enables SIMD operations to 
be performed without the need for additional execution 
units. 
This invention relates generally to data registers in microprocessor circuitry and, more
particularly, to a Single Instruction Multiple Data (SIMD) correction circuit for modifying
the results of an arithmetic/shift operation.

Heretofore, logic circuits have been proposed to improve performance of arithmetic/shift 
operations in data processing. With the increasing need for processing large amounts of 
data at ever increasing speed, improved efficiency of arithmetic/shift operations is very 
important. In particular, one of the difficulties of Multi-media, especially relating to
graphics, is the large amount of data that must be processed. An attribute of Single
Instruction Multiple Data (SIMD) is that each SIMD instruction can perform an operation
on each 8 bit, 16 bit, 32 bit, or 64 bit field of a 64 bit operand independently.

A SIMD ADD, for example, would perform an add on the first, second, third and fourth 
16 bit section of the register operands as if the SIMD ADD were 4 independent 16 bit add 
instructions. A SIMD SHIFT, for example, would perform a shift on the first, second, 
third, and fourth 16 bit section of the register operands as if the SIMD SHIFT were 4 
independent 16 bit SHIFT instructions. The SHIFT operations include shift left,
shift right logical, and shift right arithmetic.
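The behaviour described above can be illustrated with a short reference model. This is a minimal sketch of the intended results only, not of the circuit: the helper names (`split_fields`, `join_fields`, `simd_add`, `simd_shift_left`) are hypothetical and do not appear in the specification.

```python
MASK64 = (1 << 64) - 1

def split_fields(value, width):
    """Split a 64-bit integer into fields of `width` bits, low field first."""
    field_mask = (1 << width) - 1
    return [(value >> shift) & field_mask for shift in range(0, 64, width)]

def join_fields(fields, width):
    """Pack a list of fields (low field first) back into one 64-bit integer."""
    value = 0
    for index, field in enumerate(fields):
        value |= (field & ((1 << width) - 1)) << (index * width)
    return value

def simd_add(a, b, width=16):
    """Add each field independently; carries never cross a field boundary."""
    sums = [x + y for x, y in zip(split_fields(a, width), split_fields(b, width))]
    return join_fields(sums, width)

def simd_shift_left(a, count, width=16):
    """Shift each field left independently, discarding bits shifted out of it."""
    return join_fields([x << count for x in split_fields(a, width)], width)

a = 0x0001_FFFF_8000_00FF
b = 0x0001_0001_8000_0001
print(hex(simd_add(a, b)))      # four independent 16 bit adds
print(hex((a + b) & MASK64))    # plain 64 bit add: carries cross the boundaries
```

The two printed values differ exactly in the fields that received a carry from the field below, which is the difference a correction stage has to remove.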

SIMD has gained recent popularity with the announcement of the Intel MMX Extension. 

The MMX is a SIMD architecture. Implementing MMX extensions of the X86
architecture costs additional execution units dedicated to the MMX format. Converting a
standard execution unit to perform both standard and SIMD operations introduces
difficulties that have not heretofore been addressed. First, additional execution units add
delay to critical paths such as carry propagate paths, since in SIMD the carry between
SIMD sub-operands (16 bit or 32 bit sections) must be suppressed. Second, additional
execution units require additional silicon real estate (area). Third, additional execution
units increase the development time and cost because the execution units are highly
specialised circuits.

The present invention includes a correction circuit to convert a standard logic execute
unit to perform both standard operations and SIMD operations. This invention has the
advantages of alleviating the difficulty of delay to critical paths such as carry propagate
paths; requiring less silicon space than if additional execution units were added; and
not requiring the development of further highly specialised execution units. Execution
units are highly specialised and thus to make a change to one is very labour intensive. This
invention improves the performance of processing large amounts of data in applications
such as Multi-media and signal processing.

This invention eliminates the critical-path penalty, because logic is not added in the critical
paths; saves time and silicon real estate because the same execution unit is being reused;
and does not require the restructure of a complex unit with added logic therein.



Another operation that requires processing large amounts of data which has application in 
both Multi-media and signal processing is matrix multiply. The present invention may be 
used on the standard logic of arithmetic/shift operations (e.g., ADD, SUBTRACT,
DIVIDE, MULTIPLY, SHIFT) of Arithmetic Logic Units (ALUs) and Shift logic.

This invention includes a microprocessor circuit having an execution unit for execution
of standard instructions in an arithmetic/shift operation and a correction circuit responsive
to the execution unit for modifying the standard results provided by the execution
unit to results required by a SIMD instruction being executed. This improves efficiency
of the operations because the correction circuit modification may be performed in a
second cycle, leaving the arithmetic/shift operation free to execute a second instruction in
the second cycle. The arithmetic/shift operation results from an instruction provided by
either an Arithmetic Logic Unit (ALU) or by a shift function. The correction circuit
passes data unchanged for standard logical instructions but provides condition codes
according to the SIMD instruction. The correction circuit corrects arithmetic operations
by operating on standard data based on significant bits and carry bits for sub-unit
boundaries.

In the case of a Shift operation, this invention includes a Shifter performing standard
operations on instructions in a first cycle of operation, a mask generating circuit in
parallel with the Shifter circuit, and a correction circuit responsive to the Shifter and the
mask generating circuit for modifying the standard results provided by the Shifter to
results required by a SIMD Shift operation being executed. This modification may be
performed by an address overlay mask in a second cycle operation and the Shifter is free
to execute a second instruction in the second cycle.

How the invention may be carried out will now be described by way of example and with
reference to the accompanying drawings, in which like designations denote like elements,
and:

FIG. 1 depicts a block diagram showing an execution unit for performing an
arithmetic/shift operation and correction circuit of a first preferred embodiment in
accordance with the present invention.

FIG. 2 depicts a block diagram showing an execution unit for performing an 
arithmetic/shift operation unit and correction circuit of a second preferred embodiment in 
accordance with the present invention. 

FIG. 3 depicts a high level view of a SIMD operation in accordance with a preferred 
embodiment of the present invention. 

FIG. 4 depicts a flow diagram disclosing a correction unit detail in accordance with a 
preferred embodiment of the present invention. 

FIG. 5 depicts a flow diagram of a shift function in accordance with a preferred 
embodiment of the present invention. 

FIG. 6 depicts mask generator detail in accordance with the present invention as 
depicted in Fig. 5. 



FIG. 7 depicts AND/OR mask (AOM) detail in accordance with the present invention 
as depicted in Fig. 5. 

FIG. 8 depicts a block diagram showing an execution unit/correction unit for a third 
preferred embodiment in accordance with the present invention. 

Referring to Figure 1, a microprocessor circuit 10 of the present invention is shown with
a correction circuit 20. The correction circuit 20 modifies the results of either standard
operations of an ALU (Figure 4) or of standard operations of a Shifter (Figures 5-7) to
results required by a SIMD instruction being executed. Standard execution unit 16
receives an instruction to be executed from a reservation station (s1) 15 which holds the
instruction 14 to be executed. The execution unit 16 provides access to registers 12 and
performs either an ALU operation or a shifter operation. Finish stage (s2) 18 holds
results to be written when the instruction 14 completes in the write-back stage.

A correction circuit 20 is shown including correction unit 22 and finish stage (s3) 24 for
SIMD applications. In a first embodiment in Figure 1, the execution unit is depicted as a
two-stage pipeline. The first stage 17 to the pipeline enters the correction circuit 20 and
the second stage 19 to the pipeline bypasses the correction circuit 20 to MUX 26 for
non-SIMD operations. The results to registers 28 of the two pipelines 17, 19 are provided
to the registers 12.



Figure 2 depicts a correction circuit 30 of a second embodiment of the present invention.
This embodiment is similar to Figure 1 except that the MUX 26 is eliminated, permitting
a SIMD or correction operation in the same cycle as a non-SIMD operation. A SIMD
correction instruction is passed through flow line 29 and a non-SIMD instruction
bypasses the correction unit 22 through flow line 31.

In yet another implementation, Figure 8 shows a combined execution unit/correction unit
100. The correction stage could be performed by the ALU or shift operator of the
execution unit/correction unit 100 by feeding the result 17, 101 of the first pass of the
ALU or shift operator stage back to the ALU or shift operator with appropriate control
hardware added. All of these implementations become obvious to one skilled in the art
when taught the present invention and are therefore claimed by this disclosure.

SIMD Implementation Using Standard ALU

Figure 3 shows a high level view of a SIMD operation 40. An instruction 14 is depicted.
A single SIMD instruction 44 performs operations simultaneously on subsets 7, 6, 5, 4, 3,
2, 1, 0 of the registers 12. As shown, Register operand (R2) 44 points to a 64 bit register
12. The operand instruction OP specifies the operation, for example, an R1 <- R1 + R2
ADD instruction. The instruction 14 will perform independent adds on each subset 7
through 0.

Each subset 7 through 0 could be a single byte such that 8 independent adds are
performed by the instruction using each of the 8 bytes in the 64 bit register operand, or
the subset could be 16 or 32 bits.

Referring to Figure 4, the detail of the correction unit 20 of Figures 1 and 2 is depicted
for use with a standard ALU. A 64 bit execute unit 16 has an eight byte result register
24. B(7) is the high order byte, B(6), ... with B(0) being the low order byte. Each of these
bytes has three additional latches C(n), CI(n), and ZB(n). C(n) is the carry out of the high
order bit of B(n) (the carry out of the byte), CI(n) is the carry into the high order bit of
B(n), and ZB(n) indicates that all bits of B(n) are zero.

Each of these B(n) registers is connected to a box labeled F(n), the force box. F(n) has
two input control lines plus the data from B(n). The input control lines are Force 8(n) and
Force 0(n). Force 8(n) forces hex '80' on the output FB(n) of this box while Force 0(n)
forces hex '00'. If both control lines are off then the input bus is passed to the output bus
unmodified. The FB(n) byte output goes to an increment/decrement box ID(n). This box
has two control inputs IB(n) and DB(n). IB(n) increments the input bus by one and DB(n)
decrements the input bus by one. Both control lines being off passes the input bus to the
output bus R(n) unmodified. ID(n) thus has a byte output bus and an output control line
CB(n). This output is the CARRY/BORROW NOT line for the byte increment/decrement.
That is, it should be active when DB(n) is active AND FB(n)=X'00', OR when IB(n) is
active AND FB(n)=X'FF'.
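The per-byte datapath just described can be modelled in a few lines. The following is a minimal behavioural sketch, not a description of the actual hardware: the function names are hypothetical, and giving Force 8 priority when both force lines are active is an assumption, since the text does not define that case.

```python
def force_box(byte, force8=False, force0=False):
    """F(n): pass B(n) through, or force X'80' / X'00' onto FB(n)."""
    if force8:              # assumed priority if both lines are active
        return 0x80
    if force0:
        return 0x00
    return byte & 0xFF

def inc_dec_box(fb, ib=False, db=False):
    """ID(n): return (R(n), CB(n)) for the forced byte FB(n)."""
    if db:
        # CB(n) is active when a decrement borrows out of the byte.
        return (fb - 1) & 0xFF, fb == 0x00
    if ib:
        # CB(n) is active when an increment carries out of the byte.
        return (fb + 1) & 0xFF, fb == 0xFF
    return fb, False

# Forcing X'00' and then decrementing yields X'FF' and raises CB(n).
print(inc_dec_box(force_box(0x12, force0=True), db=True))
```

Chaining CB(n) into the DB or IB input of the next byte is what lets a one-byte increment/decrement box handle word and double word corrections.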

The following explains examples of the operations to be performed. While not all
combinations of these functions are possible in the current SIMD definition, the
present invention contemplates all possibilities.

A number of instruction parameters are provided during the correction stage. First,
Byte (B), Word (W), Double word (DW), and Quadword (QW) data size (where a word is
16 bits) is provided. Second, an ADD or SUBTRACT instruction is provided. Third, the
instruction may be in signed or unsigned format. Signed numbers are standard two's
complement format, while unsigned assumes only positive numbers. Fourth, these
instructions are performed either with saturation or without. Without saturation means
that results wrap if they exceed the specified size. With saturation means that if the
results exceed the size then the largest or smallest number possible given the format is
inserted. For example, in the case of an unsigned byte an overflow produces a result of
255 or hex 'FF' while an underflow (for subtract only) produces a result of 0 or hex '00'.
For signed byte an overflow produces a result of 127 or X'7F' while an underflow
(possible with add and subtract) produces a result of -128 or hex '80'.

The structure defined in Figures 1-4 allows for implementation of correction for all
combinations of the above instruction parameter scenarios. Note that while the F-box
produces the X'00' and X'80' directly, the constants X'FF' and X'7F' are produced
indirectly by taking the X'00' and X'80' and decrementing by one.
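As a reference for the wrap and saturation rules above, the following sketch computes the expected result of a single byte add in each mode. It models the arithmetic definition only, not the correction hardware, and the function names are hypothetical.

```python
def to_signed(byte):
    """Interpret an 8 bit pattern as a two's complement number."""
    return byte - 256 if byte & 0x80 else byte

def saturate(value, signed):
    """Clamp an exact result to the range representable in one byte."""
    low, high = (-128, 127) if signed else (0, 255)
    return max(low, min(high, value))

def add_byte(a, b, signed=False, saturating=False):
    """Add two 8 bit patterns and return the 8 bit result pattern."""
    exact = to_signed(a) + to_signed(b) if signed else a + b
    if saturating:
        exact = saturate(exact, signed)
    return exact & 0xFF          # without saturation the result simply wraps

print(hex(add_byte(200, 100)))                                     # wraps
print(hex(add_byte(200, 100, saturating=True)))                    # unsigned overflow
print(hex(add_byte(0x80, 0xFF, signed=True, saturating=True)))     # signed underflow
```

The two indirectly produced constants follow from byte arithmetic: X'00' minus one is X'FF' and X'80' minus one is X'7F'.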

In the simplest scenario, ADD UNSIGNED BYTE (no saturation), the low order byte
B(0) needs no correction. The next byte B(1) needs to be decremented by one if C(0)=1.
Thus DB(1)=C(0). In general B(n) needs to be decremented by one if C(n-1)=1, or
DB(n)=C(n-1).
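The rule DB(n)=C(n-1) can be checked with a small software model. This is a sketch under the assumption that the standard adder is a plain 64 bit add and that C(n) is latched for every byte; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def byte_carries(a, b):
    """Return C(0)..C(7): the carry out of each byte of the 64 bit add a + b."""
    carries, carry = [], 0
    for n in range(8):
        total = ((a >> (8 * n)) & 0xFF) + ((b >> (8 * n)) & 0xFF) + carry
        carry = total >> 8
        carries.append(carry)
    return carries

def add_unsigned_byte(a, b):
    """Byte-wise add from one 64 bit add plus the DB(n)=C(n-1) correction."""
    standard = (a + b) & MASK64           # result of the unmodified 64 bit adder
    c = byte_carries(a, b)
    corrected = 0
    for n in range(8):
        byte = (standard >> (8 * n)) & 0xFF
        db = c[n - 1] if n > 0 else 0     # B(0) never needs correction
        corrected |= ((byte - db) & 0xFF) << (8 * n)
    return corrected

print(hex(add_unsigned_byte(0x01FF, 0x0001)))
```

Each byte of the standard result holds its own sum plus the carry from the byte below, so subtracting that carry modulo 256 recovers the independent byte sum.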

For the ADD UNSIGNED WORD (no saturation) scenario, the result B(1)/B(0) needs no
correction. The next word B(3)/B(2) needs to be decremented by one if C(1)=1. This
means that DB(2)=C(1) and that DB(3)=CB(2). In general then DB(n)=C(n-1) and
DB(n+1)=CB(n) where n=2, 4, 6.



ADD UNSIGNED DOUBLE WORD follows a similar pattern while ADD UNSIGNED
QUADWORD needs no correction. It should also be obvious that the above pattern is
identical for ADD SIGNED (no saturation).
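The word, double word and quadword cases differ from the byte case only in where the carry is removed and in how the borrow CB(n) ripples upward inside a field. A minimal sketch, assuming the decrement is propagated byte by byte as described for CB(n); the function and parameter names are hypothetical.

```python
def carry_out(a, b, n):
    """C(n): carry out of byte n in the standard 64 bit add of a and b."""
    low = (1 << (8 * (n + 1))) - 1
    return ((a & low) + (b & low)) >> (8 * (n + 1))

def add_no_saturation(a, b, size_bytes):
    """Field-wise add for 1, 2, 4 or 8 byte fields from one 64 bit add."""
    standard = (a + b) & ((1 << 64) - 1)
    result, cb = 0, 0
    for n in range(8):
        byte = (standard >> (8 * n)) & 0xFF
        if n % size_bytes == 0:
            # Low byte of a field: DB(n) = C(n-1), and nothing for byte 0.
            db = carry_out(a, b, n - 1) if n > 0 else 0
        else:
            # Higher byte of a field: DB(n+1) = CB(n), the borrow from below.
            db = cb
        cb = 1 if (db and byte == 0x00) else 0
        result |= ((byte - db) & 0xFF) << (8 * n)
    return result

print(hex(add_no_saturation(0x0000_00FF_FFFF, 0x0000_0000_0001, 2)))
```

With `size_bytes=8` no byte other than B(0) starts a field, so no decrement is ever applied, matching the statement that the quadword case needs no correction.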






In ADD UNSIGNED BYTE WITH SATURATION only overflow is possible, so only
forcing X'FF' is required. The low order byte B(0) needs no decrementing but if C(0)=1
then X'FF' should be forced. This is done by setting F0(0)=1 and DB(0)=1. Thus
F0(0)=C(0) and DB(0)=C(0). The next byte B(1) needs to be decremented if C(0)=1 and
needs to be forced to X'FF' if C(1)=1. Thus F0(1)=C(1) and DB(1)=C(0) OR C(1). Note
that it is acceptable to force X'FF' in the case where C(1)=1 was caused by C(0)=1
because the result ignoring C(0) must have been X'FF' anyway. So in general
F0(n)=C(n) and DB(n)=C(n) OR C(n-1).
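The general rule F0(n)=C(n), DB(n)=C(n) OR C(n-1) can be exercised in software. The sketch below assumes a plain 64 bit add as the standard operation and recomputes the byte carries alongside it; the function name is hypothetical.

```python
def add_unsigned_byte_saturating(a, b):
    """Byte-wise saturating add from one 64 bit add plus force/decrement."""
    standard = (a + b) & ((1 << 64) - 1)
    result, carry_in = 0, 0                 # carry_in holds C(n-1)
    for n in range(8):
        total = ((a >> (8 * n)) & 0xFF) + ((b >> (8 * n)) & 0xFF) + carry_in
        c_n = total >> 8                    # C(n) of the standard add
        byte = (standard >> (8 * n)) & 0xFF
        f0 = c_n                            # F0(n) = C(n)
        db = c_n | carry_in                 # DB(n) = C(n) OR C(n-1)
        fb = 0x00 if f0 else byte           # force box output FB(n)
        result |= ((fb - db) & 0xFF) << (8 * n)
        carry_in = c_n
    return result

print(hex(add_unsigned_byte_saturating(0x01FF, 0x0001)))
```

When C(n)=1 the byte is forced to X'00' and decremented to X'FF'; when only C(n-1)=1 the decrement simply removes the carry that leaked in from the byte below.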

With regard to ADD UNSIGNED WORD WITH SATURATION, the low order word
B(1)/B(0) does not need to be decremented but if C(1)=1 then both bytes need to be
forced to X'FF'. Thus DB(1)=DB(0)=F0(0)=F0(1)=C(1). The next word B(3)/B(2)
needs to be decremented if C(1)=1 and forced to X'FFFF' if C(3)=1. For B(2) then
DB(2)=C(3) OR C(1) and F0(2)=C(3). For B(3) then F0(3)=C(3) and DB(3)=C(3) OR
CB(2). In general F0(n+1)=F0(n)=C(n+1) and DB(n)=C(n+1) OR C(n-1) and
DB(n+1)=C(n+1) OR CB(n) where n=2, 4, 6.

ADD UNSIGNED DW/QW WITH SATURATION follows a similar pattern.

In unsigned addition C(n) represents overflow, while underflow cannot happen. In
signed addition both underflow and overflow are possible, so it is more complicated. Let
OV(n)=NOT C(n) AND CI(n) represent overflow, while UV(n)=C(n) AND NOT CI(n)
represents underflow and V(n)=UV(n) OR OV(n) represents some overflow/underflow
condition.
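The overflow and underflow terms can be checked exhaustively for a single byte. This sketch takes overflow as a carry into the high order bit without a carry out of it, and underflow as the reverse (the standard two's complement rule); it considers one byte in isolation with no carry arriving from a lower byte, and the function names are hypothetical.

```python
def to_signed(byte):
    """Interpret an 8 bit pattern as a two's complement number."""
    return byte - 256 if byte & 0x80 else byte

def signed_byte_flags(a, b):
    """Return (C, CI, OV, UV) for the 8 bit add of bit patterns a and b."""
    ci = ((a & 0x7F) + (b & 0x7F)) >> 7    # carry into the high order bit
    c = (a + b) >> 8                       # carry out of the high order bit
    ov = (1 - c) & ci                      # OV = NOT C AND CI
    uv = c & (1 - ci)                      # UV = C AND NOT CI
    return c, ci, ov, uv

print(signed_byte_flags(0x7F, 0x01))   # 127 + 1 overflows
print(signed_byte_flags(0x80, 0xFF))   # -128 + -1 underflows
```

Over all 65,536 operand pairs, OV is set exactly when the true sum exceeds 127 and UV exactly when it is below -128.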

ADD SIGNED BYTE WITH SATURATION requires forcing X'7F' or X'80'. The low
order byte B(0) needs no decrementing but if OV(0)=1 then B(0) needs to be forced to
X'7F' while if UV(0)=1 then B(0) needs to be forced to X'80'. Thus F8(0)=V(0) and
DB(0)=OV(0). The next byte B(1) needs to be decremented if C(0)=1, forced to X'80' if
UV(1)=1, and forced to X'7F' if OV(1)=1. Thus F8(1)=V(1) and DB(1)=OV(1) OR C(0).
In general F8(n)=V(n) and DB(n)=OV(n) OR C(n-1).

ADD SIGNED WORD WITH SATURATION adds complication. When detecting
underflow or overflow the word must be forced to X'8000' and X'7FFF' respectively.
Thus the low order byte must be forced to X'00' and X'FF' while the high order byte
must be forced to X'80' and X'7F'. The low order word B(1)/B(0) does not need to be
decremented but if OV(1)=1 then B(1)/B(0) needs to be forced to X'7FFF' and if
UV(1)=1 then B(1)/B(0) needs to be forced to X'8000'. Thus F8(1)=F0(0)=V(1),
DB(0)=DB(1)=OV(1). The next word B(3)/B(2) must be decremented if C(1)=1, forced to
X'7FFF' if OV(3)=1, and forced to X'8000' if UV(3)=1. Thus F8(3)=F0(2)=V(3),
DB(2)=OV(3) OR C(1) and DB(3)=OV(3) OR DB(2). In general
F8(n+1)=F0(n)=V(n+1), DB(n)=OV(n+1) OR C(n-1), and DB(n+1)=OV(n+1) OR DB(n)
for n=2, 4, 6.

ADD SIGNED DW/QW WITH SATURATION follows a similar pattern. The
SUBTRACT scenarios are the same as ADD except that the correction, instead of
being a decrement by one when carry in is one, is an increment by one when carry in is
zero (borrow is one). Underflow and overflow are defined the same.

COMPARE FOR EQUAL and COMPARE FOR GREATER THAN take two operands
in the signed B, W, DW, or QW length and perform a compare. The result field is set to
all ones if true and all zeros if false. Given the ZB(n), C(n), and CI(n) signals it is trivial
to determine if the Byte, Word, Double word, or Quadword is EQUAL or GREATER
THAN. The structure defined already permits forcing all zeros and all ones.

Referring to Figure 4, the carry logic can be generated in many ways known in the art.
The figure implies a "ripple" arrangement but carry predict and carry look-ahead
techniques (for instance) may be used within the scope of the invention.

SIMD Shift Instruction With Standard Shifter

As shown in Figure 1, a reservation station (s1) 15 holds the instruction to be
executed; an execution unit 16 includes a SHIFTER (FIG. 5), access to registers 12, and a
finish stage (s2) 18 which holds results to be written when the instruction completes in
the write-back stage. Figure 2 shows the present embodiment of the invention where the
execution unit 16 is a two stage pipeline and SIMD correction is bypassed for non-SIMD
instructions. This embodiment is more fully described above. The SIMD stage takes the
result of the Shifter operation and corrects the sub-units to conform with the SIMD
operation.

In another implementation of the invention the MUX 26 could be eliminated, permitting a
SIMD instruction to complete in the same cycle as a single cycle instruction following it.

In yet another implementation of the present invention the correction stage 20 could be
performed by the Shifter stage by feeding the result of the first pass of the Shifter stage
back to the Shifter with the appropriate control hardware added. All of these
implementations become obvious to one skilled in the art when taught the present
invention and are therefore encompassed by this disclosure.

Figure 3 shows the concept of the SIMD operation: a single SIMD instruction performs
operations simultaneously on subsets of the register operands. Register (R2) 44 points to
a 64 bit operand 12. The OP code specifies the operation (R1 <- R2 SHIFT instruction
for instance). The SIMD instruction will perform independent SHIFTS on each subset.
The subset could be a single byte such that 8 independent SHIFTS are performed by the
instruction using each of the 8 bytes in the 64 bit register operand, or the subset could be
16 or 32 bits.

Figure 5 shows a high level view of a Shifter stage 60 of a preferred embodiment of the
present invention. The Shift Count (SCNT) 62, Operand 64 and Result regs 82 are part of
the standard Shift Unit function for non-SIMD instructions. A Mask Generator 76 uses
the Operand 64 and Shift Count 62 to generate a Mask Reg (MR) 80 for the AND-OR
Mask (AOM) 84 in parallel with the standard shift result. The Mask Generator, MR, and
AOM are the correction circuit as depicted in Figs. 1 and 2. The result of the AOM 84 is
latched in the final result (second stage) register.

The details of the Mask Generator 76 are shown in Figure 6. The Left or Right shift
indicator (L/R) 66 in conjunction with the Shift Count Register (SCNT) 62 creates a Shift
Count Mask (SCNTM) 62. The Byte Shift Mask (BSM) 78 generates the shift mask for
each byte of the 64 bit mask. This mask would be correct if it was a byte left shift with a
count less than eight. ANDing the mask with the data produces the proper result for a
shift left.

Table 1. Byte Shift Mask (BSM) Function

SCNTM (2:0)    BSM (7:0)
0 0 0          1 1 1 1 1 1 1 1
0 0 1          1 1 1 1 1 1 1 0
0 1 0          1 1 1 1 1 1 0 0
0 1 1          1 1 1 1 1 0 0 0
1 0 0          1 1 1 1 0 0 0 0
1 0 1          1 1 1 0 0 0 0 0
1 1 0          1 1 0 0 0 0 0 0
1 1 1          1 0 0 0 0 0 0 0

Table 2 describes the equations for deriving when the shift count is greater than or equal
to a certain number. For example SCT8 is the equation for SCNT greater than or equal to
8. They are used in Table 3 to define the force X'00' (F0Sn) function for shift left and the
force X'FF' (F1Sn) function for shift right.
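The byte shift mask for a left shift can be expressed compactly. This is a minimal sketch covering only the byte left shift with a count below eight, the case the text says the BSM handles directly; the function names are hypothetical and the standard Shifter is modelled as a plain 64 bit left shift.

```python
def byte_shift_mask(scntm):
    """BSM(7:0) for a three bit shift count mask value, as in Table 1."""
    return (0xFF << scntm) & 0xFF

def simd_byte_shift_left(operand, count):
    """Byte-wise left shift (count 0..7) from one 64 bit shift and one AND."""
    shifted = (operand << count) & ((1 << 64) - 1)      # standard Shifter result
    mask = int.from_bytes(bytes([byte_shift_mask(count)] * 8), "big")
    return shifted & mask                               # clear bits leaked from below

print(format(byte_shift_mask(3), "08b"))
print(hex(simd_byte_shift_left(0x8181818181818181, 1)))
```

The AND clears the low `count` bits of every byte, which are exactly the positions the 64 bit shift filled with bits from the neighbouring lower byte.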
Table 2. Shift Count Equations

(S6 through S3 are bits of the shift count SCNT; + denotes OR and ∧ denotes AND.)

SCT8  = S3 + S4 + S5 + S6
SCT16 = S4 + S5 + S6
SCT24 = S5 + S6 + (S3 ∧ S4)
SCT32 = S5 + S6
SCT40 = S6 + S5 ∧ (S3 + S4)
SCT48 = S6 + (S5 ∧ S4)
SCT56 = S6 + (S5 ∧ S4 ∧ S3)
SCT64 = S6
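The shift count equations can be verified by brute force. The sketch below reads the Table 2 operators as OR and AND, treats SCNT as a seven bit value S6..S0, and includes S6 in the SCT56 term so that a count of 64 also satisfies "greater than or equal to 56"; all three readings are assumptions about the printed table, and the function name is hypothetical.

```python
def shift_count_terms(scnt):
    """Evaluate the SCTk terms for a seven bit shift count S6..S0."""
    s3, s4, s5, s6 = ((scnt >> i) & 1 for i in (3, 4, 5, 6))
    return {
        8:  s3 | s4 | s5 | s6,
        16: s4 | s5 | s6,
        24: s5 | s6 | (s3 & s4),
        32: s5 | s6,
        40: s6 | (s5 & (s3 | s4)),
        48: s6 | (s5 & s4),
        56: s6 | (s5 & s4 & s3),
        64: s6,
    }

# Each term should be 1 exactly when the count reaches its threshold.
mismatches = [(n, k) for n in range(128)
              for k, bit in shift_count_terms(n).items() if bit != int(n >= k)]
print(mismatches)
```

Only bits S3 and above appear because every threshold is a multiple of eight, so the low three bits of the count never affect the comparison.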
Table 3. SHIFT LEFT AND SHIFT RIGHT

         LEFT    F0S7   F0S6   F0S5   F0S4   F0S3   F0S2   F0S1   F0S0
         RIGHT   F1S7   F1S6   F1S5   F1S4   F1S3   F1S2   F1S1   F1S0

BYTE     LEFT    0      0      0      0      0      0      0      0
         RIGHT   0      0      0      0      0      0      0      0

WORD     LEFT    SCT16  SCT8   SCT16  SCT8   SCT16  SCT8   SCT16  SCT8
         RIGHT   SCT8   SCT16  SCT8   SCT16  SCT8   SCT16  SCT8   SCT16

DWORD    LEFT    SCT32  SCT24  SCT16  SCT8   SCT32  SCT24  SCT16  SCT8
         RIGHT   SCT8   SCT16  SCT24  SCT32  SCT8   SCT16  SCT24  SCT32

QWORD    LEFT    SCT64  SCT56  SCT48  SCT40  SCT32  SCT24  SCT16  SCT8
         RIGHT   SCT8   SCT16  SCT24  SCT32  SCT40  SCT48  SCT56  SCT64


At this point, when the mask in the MR is ANDed or ORed with the 64 bit shifted data in
the result register according to the function code defined in Figure 7, the proper result is
generated. The mask generated for shift left has ones where the data should be preserved
and zeroes where it should be zeroed out. For shift right, zeroes indicate the data should
be preserved and ones indicate that zeroes or ones should be padded depending upon the
kind of shift (arithmetic or logical) and the appropriate high order bit.

Table 4 shows the generation of the function field which goes to the AOM. Referring to
Figure 7, the AOM receives the MR and Result register for each byte and performs the
function indicated in Table 5 based on the FN(X) field.

Table 4. FN(X) GENERATION

Operation:          FN7   FN6   FN5   FN4   FN3   FN2   FN1   FN0

SL (B,W,DW,QW)      01    01    01    01    01    01    01    01
SRL (B,W,DW,QW)     10    10    10    10    10    10    10    10
SRA(B)*             R63   R55   R47   R39   R31   R23   R15   R7
SRA(W)*             R63   R63   R47   R47   R31   R31   R15   R15
SRA(DW)*            R63   R63   R63   R63   R31   R31   R31   R31
SRA(QW)*            R63   R63   R63   R63   R63   R63   R63   R63

SL=SHIFT LEFT
SRL=SHIFT RIGHT LOGICAL
SRA=SHIFT RIGHT ARITHMETIC
B=BYTE
W=2 BYTES
DW=4 BYTES
QW=8 BYTES

*For SRA, the two bit FN code is a "1" concatenated with R(y) where y is a bit position
(i.e., R15 is bit 15).



Table 5. AOM FUNCTIONS

FN (1:0)    FUNCTION       DESCRIPTION
0 1         R AND MR       SHIFT LEFT
1 0         R AND MRnot    SHIFT RIGHT W/ZEROES
1 1         R OR MR        SHIFT RIGHT W/ONES
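The three AOM functions, and their use to turn one 64 bit shift into a field-wise right shift, can be sketched as follows. Several points are assumptions made for illustration rather than details taken from the figures: the mask is computed directly from bit positions instead of through the BSM and force functions, the sign bit for the arithmetic case is read from the unshifted operand, the count is assumed to lie between 0 and the field width, and all function names are hypothetical.

```python
def aom(result_byte, mask_byte, fn):
    """Combine one result byte with one mask byte under the FN(1:0) code."""
    if fn == 0b01:
        return result_byte & mask_byte             # shift left
    if fn == 0b10:
        return result_byte & ~mask_byte & 0xFF     # shift right, pad zeroes
    if fn == 0b11:
        return result_byte | mask_byte             # shift right, pad ones
    raise ValueError("FN code not listed in Table 5")

def simd_shift_right(operand, count, width, arithmetic=False):
    """Field-wise right shift from one 64 bit logical shift plus the AOM."""
    shifted = operand >> count                     # standard Shifter result
    result = 0
    for n in range(8):
        field_top = (8 * n // width + 1) * width   # one past the field's high bit
        mask_byte = 0
        for bit in range(8 * n, 8 * n + 8):
            if bit >= field_top - count:           # bit must be padded
                mask_byte |= 1 << (bit - 8 * n)
        sign = (operand >> (field_top - 1)) & 1    # high order bit of the field
        fn = 0b11 if (arithmetic and sign) else 0b10
        result |= aom((shifted >> (8 * n)) & 0xFF, mask_byte, fn) << (8 * n)
    return result

print(hex(simd_shift_right(0x8000_4000_FFFF_0001, 4, 16)))
print(hex(simd_shift_right(0x8000_4000_FFFF_0001, 4, 16, arithmetic=True)))
```

In this model the function code alone distinguishes the logical and arithmetic cases: both use the same shifted data and the same mask, and differ only in whether the masked positions are cleared or set.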
CLAIMS

1. A microprocessor circuit comprising:

a) an arithmetic/shift function performing standard operations on instructions in a
first cycle of operation; and

b) a correction circuit responsive to said arithmetic/shift function for modifying
the standard results provided by said arithmetic/shift function to results required
by a Single Instruction Multiple Data (SIMD) instruction being executed.

2. The circuit of claim 1, wherein the correction circuit modification is performed in a
second cycle.



3. The circuit of claim 2, wherein the arithmetic/shift function is free to execute a second 
instruction in the second cycle. 

4. The circuit of claim 1, wherein the arithmetic/shift function and the correction circuit
can each complete an instruction in the same cycle.

5. The circuit of claim 1, wherein the arithmetic/shift function is an instruction provided
by an Arithmetic Logic Unit (ALU).

6. The circuit of claim 1, wherein the correction circuit passes data unchanged for logical
instructions but provides condition codes according to the SIMD instruction.

7. The circuit of claim 5, wherein the correction circuit corrects arithmetic operations by
operating on standard data based on significant bits and carry bits for sub-unit boundaries.

8. The circuit of claim 5, wherein the standard ALU includes an ADD operation.

9. The circuit of claim 1, wherein the arithmetic/shift function is a SHIFT instruction.

10. A microprocessor circuit for executing Multi-Media instructions comprising:

a) an Arithmetic Logic Unit (ALU) performing standard operations on
instructions in a first cycle of operation; and

b) a correction circuit responsive to the ALU for modifying the standard results
provided by the ALU to results required by Single Instruction Multiple Data
(SIMD).

11. The circuit of claim 1, wherein the correction circuit modification is performed in a
second cycle.

12. The circuit of claim 2, wherein the ALU is free to execute a second instruction in the
second cycle.

13. The circuit of claim 1, wherein the ALU and correction circuit can each complete an
instruction in the same cycle.

14. A microprocessor circuit comprising:

a) a Shifter performing standard operations on instructions in a first cycle of
operation;

b) a mask generating circuit in parallel with the Shifter circuit; and

c) a correction circuit responsive to said Shifter and said mask generating circuit
for modifying the standard results provided by said Shifter to results required by a
SIMD SHIFT instruction being executed.

15. The circuit of claim 13, wherein the modification is performed in a second cycle.

16. The circuit of claim 14, wherein the modification is performed by an address overlay
mask circuit.

17. The circuit of claim 14, wherein the Shifter is free to execute a second instruction in
the second cycle.

18. The circuit of claim 13, wherein the Shifter and correction circuit can each complete
an instruction in the same cycle.

19. The circuit of claim 13, wherein the Shifter performs a SHIFT operation.

20. A microprocessor circuit substantially as hereinbefore described with reference to
and as shown in the accompanying drawings.